Term-Based Clustering and Summarization of Web Page Collections

Authors

  • Yongzheng Zhang
  • A. Nur Zincir-Heywood
  • Evangelos E. Milios
Abstract

Effectively summarizing Web page collections becomes increasingly critical as the amount of information on the World Wide Web continues to grow. A concise and meaningful summary of a Web page collection, generated automatically, can help Web users quickly understand the essential topics and main contents covered in the collection without spending much time browsing. However, automatically generating coherent summaries as good as human-authored ones is a challenging task, since Web page collections often contain diverse topics and contents. This research aims at clustering Web page collections using automatically extracted topical terms, and at automatically summarizing the resulting clusters. We experiment with word- and term-based representations of Web documents and demonstrate that term-based clustering significantly outperforms word-based clustering with much lower dimensionality. The summaries of the computed clusters are informative and meaningful, which indicates that clustering and summarization of large Web page collections is promising for alleviating the information overload problem.
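
The contrast between word-based and term-based document representations can be illustrated with a minimal sketch. The abstract does not specify the term extraction method or the clustering algorithm, so everything below is an assumption made for illustration: the toy pages, the hand-written list of topical terms, and the greedy threshold clustering over cosine similarity are hypothetical stand-ins, not the authors' method.

```python
import math
from collections import Counter

# Hypothetical toy pages and a hypothetical, hand-written list of topical
# terms. In the paper the terms are extracted automatically; that step is
# not reproduced here.
PAGES = [
    "machine learning methods for text mining",
    "text mining with machine learning",
    "graduate admission and course registration",
    "course registration deadlines for graduate admission",
]
TOPICAL_TERMS = [
    "machine learning",
    "text mining",
    "graduate admission",
    "course registration",
]


def word_vector(text):
    """Bag-of-words counts: one dimension per distinct word."""
    return Counter(text.lower().split())


def term_vector(text, terms=TOPICAL_TERMS):
    """Counts of the given topical terms only: one dimension per term."""
    lowered = text.lower()
    return Counter({t: lowered.count(t) for t in terms if t in lowered})


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[k] * b[k] for k in a if k in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def dimensionality(vectors):
    """Number of distinct features used across all vectors."""
    return len(set().union(*vectors))


def cluster(vectors, threshold=0.5):
    """Greedy single-pass clustering (an illustrative stand-in): each
    document joins the first cluster whose seed document is similar
    enough, otherwise it starts a new cluster."""
    clusters = []
    for i, vec in enumerate(vectors):
        for members in clusters:
            if cosine(vectors[members[0]], vec) >= threshold:
                members.append(i)
                break
        else:
            clusters.append([i])
    return clusters


word_vecs = [word_vector(p) for p in PAGES]
term_vecs = [term_vector(p) for p in PAGES]

print("word dimensions:", dimensionality(word_vecs))
print("term dimensions:", dimensionality(term_vecs))
print("term-based clusters:", cluster(term_vecs))
```

On this toy input the term-based representation uses 4 dimensions where the word-based one uses 13, which mirrors the dimensionality reduction the abstract reports. The sketch says nothing about clustering quality on real collections.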

Similar Resources

Effective Browsing of Image Search Results with Visual and Diverse Summarization through Clustering

With unprecedented growth in production of digital images and use of multimedia references, requirement of image and subject search has been increased. Systematic processing of this information is a basic prerequisite for effective analysis, organization and management of it. Likewise, large collections of images have been made available on the Web and many search engines have provided the poss...

Temporal Web Page Summarization

In the recent years the Web has become an important medium for communication and information storage. As this trend is predicted to continue, it is necessary to provide efficient solutions for retrieving and processing information found in WWW. In this paper we present a new method for temporal web page summarization based on trend and variance analysis. In the temporal summarization web docume...

Summarization of Dynamic Content in Web Collections

This paper describes a new research proposal of multi-document summarization of dynamic content in web pages. Much information is lost in the Web due to the temporal character of web documents. Therefore adapting summarization techniques to the web genre is a promising task. The aim of our research is to provide methods for summarizing volatile content retrieved from collections of topically re...

Temporal multi-page summarization

With the increasing popularity of the Web, efficient approaches to the information overload are becoming more necessary. Summarization of web pages aims at detecting the most important contents from pages so that a user can obtain a compact version of a web document or a group of pages. Traditionally, summaries are constructed on static snapshots of web pages. However, web pages are dynamic obj...

An Efficient Incremental Clustering based Summarization Technique for Web Page Classification

Currently the World Wide Web is the largest source of information. There are numerous self-acting classification advances that have been suggested. In this suggested work an effective incremental clustering approach to evolve a better incremental clustering based summarization method for World Wide Web classification which can facilitate to better coordinate the accessible data on WW...

Publication date: 2004